Lexical and Semantic Methods in Inner Text Topic Segmentation: A Comparison between C99 and Transeg

Authors

  • Alexandre Labadié
  • Violaine Prince
Abstract

This paper presents a topic text segmentation method based on semantic and syntactic distance and compares it to a very well-known text segmentation algorithm, C99. To do so, we ran the two algorithms on a corpus of twenty-two French political discourses and compared their results.

Similar resources

How Text Segmentation Algorithms Gain from Topic Models

This paper introduces a general method to incorporate the LDA Topic Model into text segmentation algorithms. We show that semantic information added by Topic Models significantly improves the performance of two word-based algorithms, namely TextTiling and C99. Additionally, we introduce the new TopicTiling algorithm that is designed to take better advantage of topic information. We show consiste...

Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation

To address the reliability problems of systems based on lexical repetition and the adaptability problems of language-dependent systems, we present a context-based topic segmentation system that relies on a new informative similarity measure based on word co-occurrence. In particular, our evaluation with the state-of-the-art in the domain, i.e. the c99 and the TextTiling algorithms, shows improved r...

Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?

The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task linked to text segmentation. To do so, we differentiated the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics inside the same text. We worked on a corpus of twenty-two French political discourses trying to...

Topic Segmentation with Hybrid Document Indexing

We present a domain-independent unsupervised topic segmentation approach based on hybrid document indexing. Lexical chains have been successfully employed to evaluate lexical cohesion of text segments and to predict topic boundaries. Our approach is based on the notion of semantic cohesion. It uses spectral embedding to estimate semantic association between content nouns over a span of multiple...

Improving Text Segmentation with Non-systematic Semantic Relation

Text segmentation is a fundamental problem in natural language processing, with applications in information retrieval, question answering, and text summarization. Almost all previous work on unsupervised text segmentation is based on the assumption of lexical cohesion, which is indicated by relations between words in the two units of text. However, they only take into account the reiteration,...

Publication date: 2008